The Challenges of Creating a Gold Standard for De-identification Research

Author

  • Allen C. Browne
Abstract

We created a Gold Standard corpus comprising over 20,000 records of annotated narrative clinical reports for use in the training and evaluation of NLM Scrubber, a de-identification software system for medical records. Our experience with designing the corpus demonstrated the conceptual complexity of the task.

Similar resources

Preparing an annotated gold standard corpus to share with extramural investigators for de-identification research

OBJECTIVE The current study aims to fill the gap in available healthcare de-identification resources by creating a new sharable dataset with realistic Protected Health Information (PHI) without reducing the value of the data for de-identification research. By releasing the annotated gold standard corpus with Data Use Agreement we would like to encourage other Computational Linguists to experime...

Applicability of CBCT as a Substitute for the Gold-Standard Tooth Clearing Technique for Identification of Internal Anatomical Variations of Mandibular Incisors

Background and Aim: Knowledge about the root canal system variations is crucial for successful endodontic treatment. This study aimed to examine the applicability of cone-beam computed tomography (CBCT) as a substitute for the gold-standard tooth clearing technique in identification of internal anatomical variations of mandibular incisors. Materials and Methods: This in-vitro study evaluated 6...

De-identifying Swedish clinical text - refinement of a gold standard and experiments with Conditional random fields

BACKGROUND In order to perform research on the information contained in Electronic Patient Records (EPRs), access to the data itself is needed. This is often very difficult due to confidentiality regulations. The data sets need to be fully de-identified before they can be distributed to researchers. De-identification is a difficult task where the definitions of annotation classes are not self-e...

Chronic Endometritis: Old Problem, Novel Insights and Future Challenges

Background/Aims: Chronic endometritis (CE) is a poorly investigated pathology that has been related to adverse reproductive outcomes, such as implantation failure and recurrent miscarriage. In this paper we aim to provide an overview about diagnosis, etiology, pathophysiology and treatment of CE, its impact on endometrial microenvironment and how it may be associated with infertility. Methods: ...

CHALLENGES OF LABORATORY SAMPLING AND DIAGNOSIS OF SARS-COV-2 VIRUS OF DISEASE (COVID-19)

Background & Aims: Given the prevalence of SARS-CoV-2 worldwide, it is essential to identify people infected with the virus and determine its different types to control the global outbreak of COVID-19. The results of the studies are controversial, so this study examines these diagnostic challenges, including the types of diagnostic methods, the type and time of sampling, and even the clinical c...

Journal title:
  • AMIA ... Annual Symposium proceedings. AMIA Symposium

Volume 2014

Publication date: 2014